Discussion on Response Time

Learn to estimate serial and parallel response time, and understand common optimization techniques.

Calculate response time using parallel processing#

Let’s use the equation for response time with which we are already acquainted:

Time_response = Time_latency + Time_processing        (1)

In parallel processing, the API gateway communicates with all the subservices simultaneously, as shown in the following illustration:

Parallel processing from API gateway to downstream services

We’ll use the processing time to calculate the response time of an API. For that, recall the latency numbers we estimated in the latency lesson to measure the response time of GET and POST requests. Let’s populate the numbers in equation (1):

  • Response time for a GET request = 331.42 ms + 4 ms = 335.42 ms

  • Response time for a POST request = 805.48 ms + 4 ms = 809.48 ms

On subsequent requests, the response time drops significantly because the base time is largely eliminated when a cached response is used:

  • Response time for a GET request = 128.32 ms + 4 ms = 132.32 ms

  • Response time for a POST request = 542.94 ms + 4 ms = 546.94 ms

Note: The 4 ms processing time is the parallel processing time of the services (as shown in the illustration); in the ideal case, it remains constant regardless of the number of subservices.
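As a quick sanity check, the parallel calculation above can be reproduced with a small Python sketch (the per-service times and latency figures are the ones used in this lesson):

```python
def parallel_processing_time(service_times_ms):
    # With parallel fan-out, the gateway waits only for the slowest subservice
    return max(service_times_ms)

def response_time(latency_ms, processing_ms):
    # Equation (1): Time_response = Time_latency + Time_processing
    return latency_ms + processing_ms

services = [4, 4, 4]  # each subservice takes 4 ms, as in the illustration
processing = parallel_processing_time(services)  # 4 ms, not 12 ms

print(round(response_time(331.42, processing), 2))  # GET: 331.42 + 4 ms
print(round(response_time(805.48, processing), 2))  # POST: 805.48 + 4 ms
```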

Calculate response time using serial processing#

Let's suppose the API gateway communicates serially with all the subservices (one after the other); in that case, the processing time will be the sum of all the times taken by subservices.

Serial processing from API gateway to downstream services

According to the illustration above, each service provider’s processing time will be 4 ms. The total processing time for all service providers is given below:

  Processing time = 4 ms + 4 ms + 4 ms = 12 ms

The response time is calculated by putting the values of latency and processing time in equation (1):

  • Response time for a GET request = 331.42 ms + 12 ms = 343.42 ms

  • Response time for a POST request = 805.48 ms + 12 ms = 817.48 ms

In the case of cached information, the response time is given as follows:

  • Response time for a GET request = 128.32 ms + 12 ms = 140.32 ms

  • Response time for a POST request = 542.94 ms + 12 ms = 554.94 ms

These response time numbers are purely network-dependent and can vary greatly. Let’s discuss the optimization techniques in the subsequent section.
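The serial calculation can be sketched the same way; the only change from the parallel case is that the processing times are summed instead of taking the maximum:

```python
def serial_processing_time(service_times_ms):
    # Serial calls: total processing time is the sum of the subservice times
    return sum(service_times_ms)

services = [4, 4, 4]
processing = serial_processing_time(services)  # 4 + 4 + 4 = 12 ms

print(round(331.42 + processing, 2))  # uncached GET response time
print(round(128.32 + processing, 2))  # cached GET response time
```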

Discussion#

The latency, processing, and response times we estimated in this chapter are obtained through the API testing tool (Postman) or gauged using standard latencies of computational and networking devices. Through some practical experiments and theoretical formulations, we have estimated ranges of response times observed routinely in different phases of the lifecycle of an API call.

However, these numbers are not definite, and depend on various parameters. We merely performed back-of-the-envelope calculations to improve our understanding and pave the way for estimating a latency and processing budget for the design problems we aim to solve. In some cases, the system’s complexity will lead to a higher response time. In such cases, we can perform several optimization techniques, as detailed below.

Response time optimization#

An API with an average response time below 200 milliseconds is considered instant, and the service is broadly categorized as a real-time service. In general, an acceptable response time ranges between 0.1 and 1 second. If the response falls outside this range, customer satisfaction is at risk, and the API needs optimization.
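These thresholds can be captured in a tiny helper; a sketch using the ranges stated above:

```python
def categorize(response_time_ms):
    # Thresholds from the lesson: below 200 ms reads as "instant";
    # 0.1-1 s is the generally acceptable range; beyond that, optimize.
    if response_time_ms < 200:
        return "instant"
    if response_time_ms <= 1000:
        return "acceptable"
    return "needs optimization"

print(categorize(132.32))   # cached GET from earlier
print(categorize(817.48))   # serial POST from earlier
print(categorize(1500))
```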

The following factors of an API play a vital role in determining its response time:

  • Optimized network: In many cases, a high response time is due to high network latency. This factor isn't easy to manage, as services communicate over shared mediums on the Internet. A service might need to deploy edge data centers near its customers to reduce network latency. An example of such a mechanism is Netflix's streaming servers, which reside inside many ISPs to provide high-bandwidth, low-latency service to customers. Another example is hyperscalers, such as Google, which have deployed private wide-area networks for low latency over long distances; traffic usually moves from Google's network to the public Internet near the customer (probably at an IXP or ISP). Additionally, as a service, we'll need to determine whether bandwidth or latency is the issue (and sometimes both are).

  • Optimized database: The query execution time can significantly affect the response time, and slow queries are a major cause of high response times. Optimized queries and schemas in relational databases can help reduce the query execution time. Apart from queries, the underlying data storage technology also plays a vital role in how quickly data is stored and retrieved.

  • Prefetch data: In some cases, we can prefetch frequently used data from the database, anticipating requests from users. Prefetching works best for data that a service exposes publicly to its users.

  • Compress media files: Generally, API responses take longer to reach the client when large files are sent over the network. If the machines on both ends are powerful enough, it is best to compress and/or encode large payloads before sending them over the network.

  • CDN assistance: If some of the data is frequently fetched by many users, it is best to serve it from a nearby CDN. Usually, the data related to media, such as images or videos, is fetched using CDN.

  • Use of API monitoring tools: In certain cases, the cause of the delayed response is not apparent. API monitoring tools can help to identify the root cause of laggy APIs.

  • Effective bot management: The world is full of bots sending countless requests to service providers. Implementing effective bot management techniques helps us avoid wasting resources on illegitimate requests.

  • Appropriate hosting service: The choice of hosting can affect response time, particularly for small services on shared hosting.

  • Data centers with optimized resources: Data centers should have resources optimized for the nature of the enterprise solution. Moreover, they should be located as close to the target users as possible.
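As a minimal illustration of the compression point above, a repetitive JSON payload can be gzip-compressed with Python's standard library before being sent over the network (whether this pays off depends on the CPU headroom on both machines):

```python
import gzip
import json

# A repetitive JSON payload, standing in for a large API response
payload = json.dumps(
    {"posts": [{"id": i, "caption": "lorem ipsum"} for i in range(500)]}
)
raw = payload.encode("utf-8")
compressed = gzip.compress(raw)

# Repetitive data compresses well, so far fewer bytes cross the network
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```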

Note: A GET request can be faster than a POST request for the following reasons:

  • GET requests can be cached; POST requests generally can't be.

  • For GET, values are sent in the URL's query string, whereas for POST, data is sent in the body of the request.

  • Only ASCII characters are allowed in a GET URL, whereas POST can carry binary data as well.

  • POST requests use multipart/form-data encoding for binary data (e.g., file uploads).
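To make the contrast concrete, the sketch below shows how the two request styles carry their data using Python's standard library; the path, field names, and values are hypothetical:

```python
import json
from urllib.parse import urlencode

# GET: parameters travel in the URL's query string (ASCII only, cacheable)
query = urlencode({"user": "alice", "page": 2})
get_request = f"GET /posts?{query} HTTP/1.1"

# POST: data travels in the request body, here JSON-encoded
body = json.dumps({"caption": "hello", "media_id": 42})
post_headers = {
    "Content-Type": "application/json",
    "Content-Length": str(len(body)),
}

print(get_request)  # GET /posts?user=alice&page=2 HTTP/1.1
```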

Summary#

We’ve estimated the latency and processing time of both GET and POST requests. From these estimations, we have concluded that we’ll use the following reference numbers of base time, RTT, and download time to estimate latency in our design problems later in this course:

Reference Numbers

| Request Type | Time_base | RTT | Time_download | Time_processing |
| --- | --- | --- | --- | --- |
| GET | Minimum time = 120.5 ms; Maximum time = 201.5 ms | 70 ms | 0.4 × size of response | Minimum time = 4 ms; Maximum time = variable |
| POST | 260 ms + 1.15 × size of request | 70 ms | 1.7 ms | Minimum time = 4 ms; Maximum time = variable |

Note: The processing time can be 12 ms, 23 ms, or 69 ms (it depends on the processing, the number of services, the distance between them, and the type of operation they perform) and can vary accordingly.

The above numbers can vary during experiments due to different factors (mainly due to the network), so we’ll provide the opportunity to change these numbers in calculators (by making them input fields) to find the response time of an API in future lessons.
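As a preview of such a calculator, here is a minimal Python sketch that takes the reference numbers as inputs; the 40 ms download time is an assumed example value plugged into the 0.4 × size formula:

```python
def latency_ms(base_ms, rtt_ms, download_ms):
    # Time_latency = Time_base + RTT + Time_download
    return base_ms + rtt_ms + download_ms

def response_time_ms(latency, processing_ms=4):
    # Equation (1): Time_response = Time_latency + Time_processing
    return latency + processing_ms

# GET with the minimum base time from the reference numbers, an assumed
# 40 ms download time, and the ideal 4 ms parallel processing time
get_latency = latency_ms(base_ms=120.5, rtt_ms=70, download_ms=40)
print(round(response_time_ms(get_latency), 2))  # 120.5 + 70 + 40 + 4 ms
```

Making the arguments input fields, as the lesson plans, lets readers plug in their own measured numbers.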

Quiz#

Let’s suppose a user interacts on a social media platform where posts with media files such as images and videos are displayed. The HTTP requests are sent to fetch the content to display to a user. Considering this scenario, answer the following quiz questions:

Quiz

(Select all that apply.) What strategies can reduce the processing time when fetching large amounts of data from an API?

A) Optimizing the database

Explanation: Yes, we can optimize the database to speed up query execution.

B) Optimizing bandwidth

Explanation: Bandwidth optimization does not lie within the scope of processing time.

C) Placing servers within a zone or region

Explanation: Placing servers at the nearest locations can greatly reduce processing time, as we discovered while estimating processing time.
